8x Despreader/Correlator Enhanced Multiplier Opcode

Conventions: 

■ Naming: 1-bit values, both real and imaginary, are referred to as codes, and sometimes
as PN codes. The real and imaginary parts of a complex value are written as {real-part,
imaginary-part}, {real, img}, {1, j}, and {I, Q}, with a preference for (real, img) and (I, Q).

■ Code mapping: we will adopt the convention where 0 -> +1 and 1 -> -1. This allows us to treat the
one-bit code values as the signs of 1-bit integers. This is compliant with CDMA2000 but contrary to
some common usage. An XOR can be used to convert from this mapping to the opposite.

■ Example: the code 01 implies 0 for the real part and 1 for the imaginary part.

■ Real-img ordering: we will adopt the convention that the img part of a complex number is allocated to
the LSB, or little-endian, position. The motivation for this is to allow real on the left and img on the
right when viewing 32-bit values as hex or binary displays.

■ Example: the code 01 encodes 1-j, assuming img or 'j' is in the LSB.

■ Earliest-latest ordering: we will adopt the convention that the earliest samples in time are assigned
the LSB slots. This is in line with naming samples in ascending order when written in time sequence.
Example: a time sequence of values on a port appears as D0:D1:D2:D3.

■ 1-bit * 8-bit complex multiplier format: we will adopt the convention that we implement a
mathematical complex multiply, assuming the input samples are preformatted as real+img, real-img
pairs. Other conventions, such as real multiplies and complex conjugates of the imaginary part of the
code, will require additional preformatting of the input data, but may be implemented as opcode
options.

■ Example: {0,0}*{i,q} = {(i-q), (i+q)};
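The conventions above can be sketched in C as follows. This is a minimal illustration, not the hardware definition: the type and function names (`conv_cplx_t`, `code_decode`, `code_flip_mapping`, `code_mul`) are hypothetical, and the only facts taken from the text are the bit positions (bit 1 = real, bit 0 = img), the 0 -> +1, 1 -> -1 mapping, and the XOR conversion to the opposite mapping.

```c
#include <assert.h>

/* Hypothetical names for illustration; layout follows the conventions:
   bit 1 = real part, bit 0 = imaginary part, 0 -> +1 and 1 -> -1. */
typedef struct { int re; int im; } conv_cplx_t;

/* Decode a 2-bit code into its (+/-1, +/-1) complex value. */
conv_cplx_t code_decode(unsigned code)
{
    conv_cplx_t c;
    c.re = (code & 2u) ? -1 : +1;
    c.im = (code & 1u) ? -1 : +1;
    return c;
}

/* Convert a code to the opposite mapping (0 -> -1, 1 -> +1) with an XOR. */
unsigned code_flip_mapping(unsigned code)
{
    return (code ^ 3u) & 3u;
}

/* Mathematical complex multiply of a decoded code with a sample (i, q). */
conv_cplx_t code_mul(unsigned code, int i, int q)
{
    conv_cplx_t c = code_decode(code);
    conv_cplx_t out;
    out.re = c.re * i - c.im * q;
    out.im = c.re * q + c.im * i;
    return out;
}
```

With these helpers, code 01 decodes to (+1, -1), that is 1-j, and code 00 applied to {i,q} gives {(i-q), (i+q)}, matching the examples in the list.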



Additions: 



1-bit * 8-bit complex vector dot product definition: SUM(code[]*data[])

    complex_1 code[8];
    complex_8 data[8];
    complex_8 dotproduct;
    for (n = 0; n < nelm; n++)
    {
        dotproduct += code[n] * data[n];
    }
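The dot product definition can be written as compilable C under a few stated assumptions. The names `dp_data8_t`, `dp_acc_t` and `code_dot_product` are hypothetical. The codes are assumed to be packed two bits per element with element n in bits [2n+1:2n], and the accumulator is widened to 32 bits for safety; the source declares the result as `complex_8` and does not state an accumulator width.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int8_t re; int8_t im; } dp_data8_t;   /* 8-bit complex sample */
typedef struct { int32_t re; int32_t im; } dp_acc_t;   /* assumed wide accumulator */

/* codes: 2 bits per element, element n in bits [2n+1:2n] (earliest in the LSBs).
   Within a pair, bit 1 is the real part and bit 0 the imaginary part,
   with 0 -> +1 and 1 -> -1. nelm must not exceed 16 for a 32-bit code word. */
dp_acc_t code_dot_product(uint32_t codes, const dp_data8_t *data, int nelm)
{
    dp_acc_t acc = {0, 0};
    for (int n = 0; n < nelm; n++) {
        unsigned pair = (codes >> (2 * n)) & 3u;
        int cr = (pair & 2u) ? -1 : 1;
        int ci = (pair & 1u) ? -1 : 1;
        /* (cr + j*ci) * (re + j*im) */
        acc.re += cr * data[n].re - ci * data[n].im;
        acc.im += cr * data[n].im + ci * data[n].re;
    }
    return acc;
}
```

For example, with data {1+2j, 3+4j} and codes {00, 01}, the two products are -1+3j and 7+j, so the sum is 6+4j.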


2 - CODE code encoding

1-bit complex integer format - CODE.i, CODE.q

CODE code value    Numerical meaning
0                  +1.0
1                  -1.0


3 - CODE format

1-bit complex integer format - CODE[1:0]

Bit        Numerical meaning
CODE[1]    I, or real part
CODE[0]    Q, j, or imaginary part



4 - Complex 16-bit data input format

16-bit complex integer format - IN[31:0]

Field        Numerical meaning
IN[31:16]    I or real part
IN[15:00]    Q or imaginary part


5 - Complex 8-bit data input format 0, 16-bit aligned

8-bit complex integer format - IN[31:0]

Field        Numerical meaning
IN[23:16]    i or real part
IN[07:00]    q or imaginary part


6 - Dual complex 8-bit data input format 1, 16-bit aligned

Dual 8-bit complex integer format - IN[31:0]

Field        Numerical meaning
IN[31:24]    i1 or real part, sample 1
IN[15:08]    q1 or imaginary part, sample 1
IN[23:16]    i0 or real part, sample 0
IN[07:00]    q0 or imaginary part, sample 0
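The two 8-bit input formats can be unpacked as sketched below. The function and type names are hypothetical, and the samples are assumed to be signed two's complement bytes; the format tables give the field positions but do not state signedness.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int8_t i; int8_t q; } in_sample8_t;

/* Sign-extend the low 8 bits of v (assumes two's complement samples). */
static int8_t in_sext8(uint32_t v)
{
    int b = (int)(v & 0xFFu);
    return (int8_t)(b >= 128 ? b - 256 : b);
}

/* Format 0: one complex sample, 16-bit aligned: i in [23:16], q in [07:00]. */
in_sample8_t unpack_format0(uint32_t in)
{
    in_sample8_t s;
    s.i = in_sext8(in >> 16);
    s.q = in_sext8(in);
    return s;
}

/* Format 1: two samples: i1 [31:24], i0 [23:16], q1 [15:08], q0 [07:00]. */
void unpack_format1(uint32_t in, in_sample8_t *s0, in_sample8_t *s1)
{
    s0->i = in_sext8(in >> 16);
    s0->q = in_sext8(in);
    s1->i = in_sext8(in >> 24);
    s1->q = in_sext8(in >> 8);
}
```

In format 1 the real bytes of both samples sit in the upper half-word and the imaginary bytes in the lower half-word, so format 0 is simply format 1 with sample 1 set to zero.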
CODE formats

Each CODE bit pair is used to drive a mux/negate unit connected to two input bytes.

OPCODE                    4XDESP8                [not legible]          [not legible]
Packing in 32-bit word    0 : i0 : 0 : q0        [not legible]          [not legible]

CODE bits   Data input             Data input             Data input
0,1         I0[23:16], I0[07:00]   I0[23:16], I0[07:00]   I0[15:08], I0[07:00]
2,3         I1[23:16], I1[07:00]   I0[31:24], I0[15:08]   D0[15:08], D0[07:00]
4,5         I2[23:16], I2[07:00]   I1[23:16], I1[07:00]   D1[15:08], D1[07:00]
6,7         I3[23:16], I3[07:00]   I1[31:24], I1[15:08]   D2[15:08], D2[07:00]
8,9         -                      I2[23:16], I2[07:00]   D3[15:08], D3[07:00]
10,11       -                      I2[31:24], I2[15:08]   D4[15:08], D4[07:00]
12,13       -                      I3[23:16], I3[07:00]   D5[15:08], D5[07:00]
14,15       -                      I3[31:24], I3[15:08]   D6[15:08], D6[07:00]
16,17       I0[23:16], I0[07:00]   I0[23:16], I0[07:00]   D7[15:08], D7[07:00]
18,19       I1[23:16], I1[07:00]   I0[31:24], I0[15:08]   D8[15:08], D8[07:00]
20,21       I2[23:16], I2[07:00]   I1[23:16], I1[07:00]   D9[15:08], D9[07:00]
22,23       I3[23:16], I3[07:00]   I1[31:24], I1[15:08]   D10[15:08], D10[07:00]
24,25       -                      I2[23:16], I2[07:00]   D11[15:08], D11[07:00]
26,27       -                      I2[31:24], I2[15:08]   D12[15:08], D12[07:00]
28,29       -                      I3[23:16], I3[07:00]   D13[15:08], D13[07:00]
30,31       -                      I3[31:24], I3[15:08]   D14[15:08], D14[07:00]
CODE multiply routing

A mux is employed to route CODE bits to the correct CODE multiply unit (the mux-negate unit) with the
truth table below. The following table summarizes the routing to the 32 mux-negate units as a function of
opcode. The 16XADD and 16XASUB opcodes could also be implemented in the mux/neg block instead of
the mux.



CODE (real, img)    mapping    result.real    result.img
00                  +1, +1     +r             +i
01                  +1, -1     +i             -r
10                  -1, +1     -i             +r
11                  -1, -1     -r             -i


                                  OPCODE
mux negate unit    Despreader    4XDESP       8XDESP       16XCorrelate
                                 C src bit    C src bit    C src bit
x00                T0.img        c[0,1]       c[0,1]       c[0,1]
x01                T0.img        c[2,3]       c[4,5]       c[2,3]
x02                T0.img        c[4,5]       c[8,9]       c[4,5]
x03                T0.img        c[6,7]       c[12,13]     c[6,7]
x04                T0.img        -            c[2,3]       c[8,9]
x05                T0.img        -            c[6,7]       c[10,11]
x06                T0.img        -            c[10,11]     c[12,13]
x07                T0.img        -            c[14,15]     c[14,15]
y08                T0.real       c[0,1]       c[0,1]       c[0,1]
y09                T0.real       c[2,3]       c[4,5]       c[2,3]
y10                T0.real       c[4,5]       c[8,9]       c[4,5]
y11                T0.real       c[6,7]       c[12,13]     c[6,7]
y12                T0.real       -            c[2,3]       c[8,9]
y13                T0.real       -            c[6,7]       c[10,11]
y14                T0.real       -            c[10,11]     c[12,13]
y15                T0.real       -            c[14,15]     c[14,15]
x16                T1.img        c[16,17]     c[16,17]     c[16,17]
x17                T1.img        c[18,19]     c[20,21]     c[18,19]
x18                T1.img        c[20,21]     c[24,25]     c[20,21]
x19                T1.img        c[22,23]     c[28,29]     c[22,23]
x20                T1.img        -            c[18,19]     c[24,25]
x21                T1.img        -            c[22,23]     c[26,27]
x22                T1.img        -            c[26,27]     c[28,29]
x23                T1.img        -            c[30,31]     c[30,31]
y24                T1.real       c[16,17]     c[16,17]     c[16,17]
y25                T1.real       c[18,19]     c[20,21]     c[18,19]
y26                T1.real       c[20,21]     c[24,25]     c[20,21]
y27                T1.real       c[22,23]     c[28,29]     c[22,23]
y28                T1.real       -            c[18,19]     c[24,25]
y29                T1.real       -            c[22,23]     c[26,27]
y30                T1.real       -            c[26,27]     c[28,29]
y31                T1.real       -            c[30,31]     c[30,31]
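The routing table follows a regular pattern that can be captured in a few lines of C. The sketch below is an observation drawn from the table entries, not a description of the actual mux logic; the enum and function names are hypothetical. It returns the lower index of the CODE bit pair c[n, n+1] feeding a given mux-negate unit.

```c
#include <assert.h>

enum desp_opcode { OP_4XDESP, OP_8XDESP, OP_16XCORR };

/* Returns n such that CODE bits c[n, n+1] feed mux-negate unit `unit`
   (0..31), or -1 when the unit has no source for that opcode. */
int code_pair_lsb(enum desp_opcode op, int unit)
{
    int base = (unit >= 16) ? 16 : 0;  /* T1 units use CODE bits 16..31 */
    int k = unit % 8;                  /* position within an 8-unit group */

    switch (op) {
    case OP_4XDESP:
        /* only the first four units of each group are driven */
        return (k < 4) ? base + 2 * k : -1;
    case OP_8XDESP:
        /* first four units take pairs 0,4,8,12; last four take 2,6,10,14 */
        return base + ((k < 4) ? 4 * k : 4 * (k - 4) + 2);
    case OP_16XCORR:
        /* pairs are taken in order */
        return base + 2 * k;
    }
    return -1;
}
```

The img and real units at the same group position (for example x01 and y09) receive the same bit pair, which is why only the position within the 8-unit group matters.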
Multiply Unit

CODE Multiply format

The code multiply unit multiplies 2 complex values: a 1-bit complex value known as
the code and an 8-bit complex pair known as data.

This leads to the table:

CODE (real, img)      result.real =          result.img =

00 ->  1, 1             data.r - data.i;       data.r + data.i;

01 ->  1,-1             data.r + data.i;     - data.r + data.i;

10 -> -1, 1           - data.r - data.i;       data.r - data.i;

11 -> -1,-1           - data.r + data.i;     - data.r - data.i;

For efficient implementation this is to be implemented in the
despreader by:

1) requiring the data to be preformatted as: data.r = r-i, data.i = r+i;

2) using a mux followed by a negate to implement the multiply, as
follows:

CODE (real, img)      result.real      result.img

00 ->  1, 1             r - i            r + i

01 ->  1,-1             r + i          -(r - i)

10 -> -1, 1           -(r + i)           r - i

11 -> -1,-1           -(r - i)         -(r + i)
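The mux-and-negate table can be checked against the direct complex multiply with a short C sketch. The names below are hypothetical; `h` and `l` stand for the preformatted inputs r-i and r+i, and the code mapping is the document convention (0 -> +1, 1 -> -1, bit 1 = real).

```c
#include <assert.h>

typedef struct { int re; int im; } mn_out_t;

/* Mux followed by negate on preformatted inputs h = r - i, l = r + i. */
mn_out_t mux_negate_preformatted(unsigned code, int h, int l)
{
    mn_out_t o;
    switch (code & 3u) {
    case 0:  o.re =  h; o.im =  l; break;  /* code ( 1, 1) */
    case 1:  o.re =  l; o.im = -h; break;  /* code ( 1,-1) */
    case 2:  o.re = -l; o.im =  h; break;  /* code (-1, 1) */
    default: o.re = -h; o.im = -l; break;  /* code (-1,-1) */
    }
    return o;
}

/* Reference: direct complex multiply (cr + j*ci) * (r + j*i). */
mn_out_t code_multiply_reference(unsigned code, int r, int i)
{
    int cr = (code & 2u) ? -1 : 1;
    int ci = (code & 1u) ? -1 : 1;
    mn_out_t o;
    o.re = cr * r - ci * i;
    o.im = cr * i + ci * r;
    return o;
}

/* Returns 1 when both forms agree for every code on the given sample. */
int mux_negate_matches_reference(int r, int i)
{
    for (unsigned code = 0; code < 4; code++) {
        mn_out_t a = mux_negate_preformatted(code, r - i, r + i);
        mn_out_t b = code_multiply_reference(code, r, i);
        if (a.re != b.re || a.im != b.im)
            return 0;
    }
    return 1;
}
```

Each output is one of the two preformatted inputs, optionally negated, so no adder is needed inside the multiply unit itself.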

If a 45 degree rotation and scaling is allowed, as is acceptable when pilot and
data are decoded together, the preformatting can be dropped to yield
the following function table:

CODE (real, img)      result.real      result.img

00 ->  1, 1             r                i

01 ->  1,-1             i              -(r)

10 -> -1, 1           -(i)               r

11 -> -1,-1           -(r)             -(i)
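The rotated table differs from the full product only by a common factor of (1 + j), which is the 45 degree rotation and sqrt(2) scaling mentioned above. The following C sketch, with hypothetical names, implements the rotated table and checks that multiplying its output by (1 + j) recovers the full code multiply.

```c
#include <assert.h>

typedef struct { int re; int im; } rot_out_t;

/* Rotated form: a mux and a negate on the raw (r, i) inputs. */
rot_out_t mux_negate_rotated(unsigned code, int r, int i)
{
    rot_out_t o;
    switch (code & 3u) {
    case 0:  o.re =  r; o.im =  i; break;  /* code ( 1, 1) */
    case 1:  o.re =  i; o.im = -r; break;  /* code ( 1,-1) */
    case 2:  o.re = -i; o.im =  r; break;  /* code (-1, 1) */
    default: o.re = -r; o.im = -i; break;  /* code (-1,-1) */
    }
    return o;
}

/* Returns 1 when (1 + j) * rotated equals the full product for all codes. */
int rotated_matches_product(int r, int i)
{
    for (unsigned code = 0; code < 4; code++) {
        int cr = (code & 2u) ? -1 : 1;
        int ci = (code & 1u) ? -1 : 1;
        int full_re = cr * r - ci * i;
        int full_im = cr * i + ci * r;
        rot_out_t t = mux_negate_rotated(code, r, i);
        /* (1 + j) * (a + j*b) = (a - b) + j*(a + b) */
        if (full_re != t.re - t.im || full_im != t.re + t.im)
            return 0;
    }
    return 1;
}
```

Because the factor (1 + j) is the same for every code, it applies equally to the pilot and data results and cancels when one is referenced to the other.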



The real part is close to the UMTS encoding:

bits    UMTS

00      + i
01      + r
10      - r
11      - i



Chameleon Systems Confidential

Vermont Despreader / Correlator Specification
Document Control No. 01-003
Revision 1.0

Mux-negate Options

Besides the complex multiply, other modes of the mux-negate units are useful.

These modes are:

complex - the normal multiply
complex-conjugate - complement the imaginary part of the code before the multiply
zero - force the code to 00 to effect an adder
real - use the real part of the code to negate the real part of the data, and the img part
of the code to negate the img part of the data

The following truth table would apply if we decided to implement these
additional modes (assume the data at the mux is called real, img).

Assuming the 4 modes are adopted, we can use the input mux bits for the source
of the bits.

mode           code    real result    img result
complex        00      real           img
complex        01      img            -real
complex        10      -img           real
complex        11      -real          -img
complex-cnj    01      real           img
complex-cnj    00      img            -real
complex-cnj    11      -img           real
complex-cnj    10      -real          -img
real-r*        0x      real
real-r         1x      -real
real-i**       x0                     img
real-i         x1                     -img
zero           xx      real           img

* real mode selects the real input and uses code[1] to control negation for the
real output.

** real mode selects the img input and uses code[0] to control negation for the
img output.
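The mode truth table can be expressed as one C function. This is a sketch with hypothetical names; it folds the real-r and real-i rows into a single real mode that produces both outputs, and it implements complex-conjugate by complementing code bit 0 before the normal lookup, as the mode description states.

```c
#include <assert.h>

enum mn_mode { MN_COMPLEX, MN_COMPLEX_CNJ, MN_REAL, MN_ZERO };

typedef struct { int re; int im; } mode_out_t;

/* real, img: the data presented at the mux; code: 2 bits, bit 1 = real. */
mode_out_t mux_negate_mode(enum mn_mode mode, unsigned code, int real, int img)
{
    mode_out_t o;
    code &= 3u;
    if (mode == MN_ZERO)
        code = 0u;            /* force code to 00: data passes through */
    if (mode == MN_COMPLEX_CNJ)
        code ^= 1u;           /* complement the imaginary code bit */

    if (mode == MN_REAL) {
        /* each code bit only negates its own component */
        o.re = (code & 2u) ? -real : real;
        o.im = (code & 1u) ? -img : img;
        return o;
    }
    switch (code) {
    case 0:  o.re =  real; o.im =  img;  break;
    case 1:  o.re =  img;  o.im = -real; break;
    case 2:  o.re = -img;  o.im =  real; break;
    default: o.re = -real; o.im = -img;  break;
    }
    return o;
}
```

In zero mode every unit passes its data unchanged, so the downstream adder tree simply sums its inputs.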
Despreading

Despreading is used in rake receivers. The common case, used in both 1XRTT and UMTS, is to despread a
symbol against 2 separate codes to recover the pilot and data channels. The circuit below is used as a test
case to evaluate despreading performance for a variety of architectures. It despreads four 16-bit or eight
8-bit complex input samples, known as "chips", to form two complex results corresponding to the pilot and
data outputs of a rake despreader. Each input is stored as 8-bit complex data, which may be unpacked to
16-bit complex data. The input data is assumed to be stored in separate LSM memories and is addressed so
as to read out a contiguous neighborhood of 4 samples separated by a 1-chip time delay.



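[Figure: despreader test circuit. Samples read from the LSMs are multiplied by pilot_code and
data_code and summed to form pilot_out and data_out. Legend: each multiplier symbol is a 1-bit
complex multiply.]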

Vermont architecture enhancements which increase the number of chips per tile are: 

■ the address generator 

■ support for 8-bit unpacking

■ support for addsub16 and subadd16 instructions

Vermont plus architecture enhancements which increase the number of chips per tile are: 

■ Adder tree in 2X multipliers 

■ despreader opcode in enhanced Vermont 

The data storage format for input data is important for efficiency. In some cases increased performance can 
be obtained if data is stored in memory as 16-bit data or in a redundant form of 8-bit data. The effect of 
various hw options has been summarized in table 1 and the implementations summarized in table 2. 

Table 1 - Number of chips per tile for despreading data in memory against 2 CODE codes 
function    data type    CS2112    Vermont    Vermont    Vermont    Vermont
                                              2Xmul      SBBA       EX
despread    8-bit*       1         1.4        2.3                   4
despread    16-bit       1         1.75       3.5                   4
despread    2X8-bit**    1.4                             3.5        8
corr        8-bit        52        64         64         64         192

* stores 8-bit complex data in memory as a 32-bit word organized as:
+i:-q:+q:+i;

** stores 8-bit complex data in memory as 32-bit words organized as:
i0:q0:i1:q1,
i1:q1:i2:q2;



Table 2 - The table below details the DPU usage for the despreading modes used above:

format    chip      dpu0                dpu1             dpu2    dpu3    mult             nDPU     nDPU    nDPU    chip/
                                                                                          pilot    data    tot     tile
8-bit     2112      mem                 unpack, negate   swap    tree    -                4        3       7       1
8-bit     V         mem, unpack         negate, swap     tree                             3        2       5       1.4
8-bit     V2x       mem, unpack         negate, swap                     tree             2        1       3       2.3
8-bit     Vex       mem, unpack                                          codemult, tree   1        0       1       4
16-bit    2112      mem                 negate           swap    tree    tree             4        3       7       1
16-bit    V         mem, negate, swap   tree                                              2        2       4       1.75
16-bit    V2x       mem, negate, swap                                    tree             1        1       2       3.5
16-bit    Vex       mem                                                  codemult, tree   1        0       1       4
2X8bit    2112      mem                 negate, swap     tree                             3        2       5
2X8bit    V sbba    mem, sbba                                            tree             2        2       4       3.5
2X8bit    Vex       mem                                                  codemult, tree   1        0       1       8



1XRTT / UMTS Rake Receiver channel count

Based on the despreading performance and a 150 MHz clock for Vermont, the estimated 1XRTT and
UMTS rake receiver channel count is summarized below:

                  CS2112    CS2112      Vermont     Vermont     Vermont
                  DPUs      channels    channels    2Xmul       EX
                                                    channels    channels
1XRTT channels    50        50          75          100         150-200
UMTS channel      32        16          24          32          32-48
Despreading - Implementation 1

The diagram below implements a 4-chip despreader against two different CODE codes.

[Figure: 4-chip despreader. Labels: din; inputs I0, I1, I2, I3; outputs O0, O1; I-Q; H and L terms
feeding the adders.]



16-bit implementation of despreading opcode

I,Q * CODE

CODE    O[31:16] =        O[15:0] =
00      -H = -(I-Q)       -L = -(I+Q)
01      -L = -(I+Q)        H =  (I-Q)
10       L =  (I+Q)       -H = -(I-Q)
11       H =  (I-Q)        L =  (I+Q)

CODE (real, img)      result.real      result.img

00 -> -1,-1           -(r - i)         -(r + i)
01 -> -1, 1           -(r + i)           r - i
10 ->  1,-1             r + i          -(r - i)
11 ->  1, 1             r - i            r + i
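The 16-bit output table can be sketched in C as below. Note that this table maps code bit 0 to -1 and bit 1 to +1, the opposite of the mapping adopted in the Conventions section. The function name is hypothetical, and wrap-around truncation of each result to 16 bits is an assumption; the source does not state overflow behavior.

```c
#include <assert.h>
#include <stdint.h>

/* H = I - Q and L = I + Q; code bit 1 = real, bit 0 = img,
   with 0 -> -1 and 1 -> +1 as in the table for this implementation.
   Result: real part in O[31:16], imaginary part in O[15:0]. */
uint32_t desp16_output(unsigned code, int16_t i, int16_t q)
{
    int32_t h = (int32_t)i - (int32_t)q;
    int32_t l = (int32_t)i + (int32_t)q;
    int32_t hi, lo;

    switch (code & 3u) {
    case 0:  hi = -h; lo = -l; break;
    case 1:  hi = -l; lo =  h; break;
    case 2:  hi =  l; lo = -h; break;
    default: hi =  h; lo =  l; break;
    }
    /* assumed: each field wraps to 16 bits (no saturation) */
    return (((uint32_t)hi & 0xFFFFu) << 16) | ((uint32_t)lo & 0xFFFFu);
}
```

For I = 5 and Q = 2, H = 3 and L = 7, so code 11 packs to 0x00030007 and code 00 packs the negated pair.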
8-bit

[Figure: 8-bit correlation circuits. Labels: inputs I0, I1, I2, I3; outputs O0, O1.]

Correlation circuits. Circuit 3 is implemented as
the correlation opcode.




Despreader Trees without input delay 3 

A despreader tree can be constructed to implement a dual 4-chip despreader for 16-bit data and a dual
8-chip despreader for 8-bit data. 4 despreader trees are needed, one for each 16-bit output field.

Function              Output       Function
Despreader Tree 0     O0[15:00]    real - i
Despreader Tree 1     O0[31:16]    imaginary - q
Despreader Tree 2     O1[15:00]    real - i
Despreader Tree 3     O1[31:16]    imaginary - q



[Figure: despreader tree with a direct input and input muxes 0 to 3. Byte inputs: i0[07:00],
i0[23:16], i1[07:00], i1[23:16], i2[07:00], i2[23:16], i3[07:00], i3[23:16], i0[15:08], i0[31:24],
i1[15:08], i1[31:24], i2[15:08], i2[31:24], i3[15:08], i3[31:24]. Output: O0[15:0].]

Fig 2 - Despread Tree 0
Despreader Trees with input delay

Adding a separate input mux and delay chain enables a dual correlation function to be implemented with
only one external DPU. For this mode, output O0 is the sum C0+C1.

Function             Function         Output -     Output -       Chain input           Chain output
                                      despread     correlation
Despreader Tree 0    real - i         O0[15:00]    C0[15:00]      I0[23:16], I0[7:0]    chain[23:16], [7:0]
Despreader Tree 1    imaginary - q    O0[31:16]    C0[31:16]
Despreader Tree 2    real - i         O1[15:00]    C1[15:00]      chain[23:16], [7:0]   O1[15:00]
Despreader Tree 3    imaginary - q    O1[31:16]    C1[31:16]
[Figure: despreader tree with a direct input and input muxes 0 to 3. Byte inputs: i0[07:00],
i0[23:16], i1[07:00], i1[23:16], i2[07:00], i2[23:16], i3[07:00], i3[23:16], i0[15:08], i0[31:24],
i1[15:08], i1[31:24], i2[15:08], i2[31:24], i3[15:08], i3[31:24].]

Fig 2 - Despread Tree 0
Physical Layout

                                     elements per output    total
4:1 Muxes 8-bits                     8                      32
pipe regs 8-bit                      16                     64
XOR 8-bit                            8                      32
adders 8-bit                         4                      16
adders 9-bit                         2                      8
adders 10-bit                        3                      12
Total blocks                         352                    1408
Total blocks in 4:1 mux and pipe     192                    768

Loads per input = 4

1408 * 100 um^2 = 0.1408 mm^2



[Figure: floorplan. Labels: outputs O1[31:16] and O0[31:16]; inputs i0, i1, i2, i3; mux blocks m00,
m01, m10, m11, m20, m21, m30, m31; adders a0 to a7 repeated for each mux pair.]

16 rows at 10 um each??
Despreader integration with input and Output muxes

[Figure: despreader integration. Labels: Imux0, Imux1, Imux2, Imux3, Outmux1.]

The 16-bit output of each 'real' despreader tree is packed with the output of the corresponding
'imaginary' despreader tree [remainder of sentence illegible]. An addition [illegible] output
[illegible] is performed inside the 2x multiplier in 4ADD16 (packed 16-bit addition) mode.
[illegible] inside the despreader is used to determine the other operand of the final add.
The operand could either be zero or 2 packed 16-bit '0001'.



Correlator integration with input and Output muxes

[Figure: correlator integration. Labels: Imux0, Imux1, Imux2, Imux3, Outmux1.]

The 32-bit chain output is added with all zeros in the 2x mult before being sent to output mux 1.
The two 32-bit packed outputs C0 and C1 are added together before being sent to output mux 0.



A 5-bit opcode is used in the enhanced multiplier (both 2xmult and desp/corr) for decoding 9 modes in
2xmult and 12 modes in desp/corr, as shown in the following table.



Mode                   Bit[4]    Bit[3]    Bit[2]    Bit[1]    Bit[0]
4xdesp8 complex        1         1         1         0         0
4xdesp8 comp-conjug
4xdesp8 zero
4xdesp8 real
8xdesp8 complex
8xdesp8 comp-conjug
8xdesp8 zero
8xdesp8 real
Corr complex
Corr comp-conjug
Corr zero
Corr real
2MULT
4ADD32
4ADD16
4MULT
4MULTSUM
4MULT2SUM
4FIR
CMULT
CMULT16

(Bit values for the modes after the first row are not legible in the source.)




There is an output register in each of the desp/corr trees.
[illegible] be registered [illegible] 2-to-1 [illegible] mux.

All desp/corr tree outputs would go to an adder in the 2xmult before going to the output mux.